AITopics | grammatical correctness

grammatical correctness

Sensitivity of Small Language Models to Fine-tuning Data Contamination

Scaria, Nicy, Kennedy, Silvester John Joseph, Subramani, Deepak

arXiv.org Artificial Intelligence, Nov-11-2025

Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments, yet their behavioral robustness to data contamination during instruction tuning remains poorly understood. We systematically investigate the contamination sensitivity of 23 SLMs (270M to 4B parameters) across multiple model families by measuring susceptibility to syntactic and semantic transformation types during instruction tuning: syntactic transformations (character and word reversal) and semantic transformations (irrelevant and counterfactual responses), each applied at contamination levels of 25%, 50%, 75%, and 100%. Our results reveal fundamental asymmetries in vulnerability patterns: syntactic transformations cause catastrophic performance degradation, with character reversal producing near-complete failure across all models regardless of size or family, while semantic transformations demonstrate distinct threshold behaviors and greater resilience in core linguistic capabilities. Critically, we discover a "capability curse" where larger, more capable models become more susceptible to learning semantic corruptions, effectively following harmful instructions more readily, while our analysis of base versus instruction-tuned variants reveals that alignment provides inconsistent robustness benefits, sometimes even reducing resilience. Our work establishes three core contributions: (1) empirical evidence of SLMs' disproportionate vulnerability to syntactic pattern contamination, (2) identification of asymmetric sensitivity patterns between syntactic and semantic transformations, and (3) systematic evaluation protocols for contamination robustness assessment. These findings have immediate deployment implications, suggesting that current robustness assumptions may not hold for smaller models and highlighting the need for contamination-aware training protocols.
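
The two syntactic transformations named in the abstract, applied at a chosen contamination level, can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names, the whitespace-based word splitting, and the seeded random choice of which responses to corrupt are all assumptions.

```python
import random
from typing import Callable, List


def reverse_characters(text: str) -> str:
    """Character reversal: the whole response is written back to front."""
    return text[::-1]


def reverse_words(text: str) -> str:
    """Word reversal: word order is flipped, each word stays intact.

    Assumes whitespace tokenization (hypothetical choice).
    """
    return " ".join(reversed(text.split()))


def contaminate(
    responses: List[str],
    transform: Callable[[str], str],
    level: float,
    seed: int = 0,
) -> List[str]:
    """Apply `transform` to a fraction `level` of the responses.

    `level` is the contamination level as a fraction (0.25, 0.5, 0.75, 1.0).
    Which responses get corrupted is picked at random with a fixed seed;
    that selection scheme is an assumption, not taken from the paper.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0 and 1")
    rng = random.Random(seed)
    n_corrupt = round(level * len(responses))
    chosen = set(rng.sample(range(len(responses)), n_corrupt))
    return [transform(r) if i in chosen else r for i, r in enumerate(responses)]
```

For example, `contaminate(responses, reverse_characters, 0.25)` would leave three quarters of the responses untouched and reverse the characters of the rest.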

large language model, machine learning, natural language, (21 more...)

2511.06763

Country:

Europe (0.68)
Asia (0.67)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Differentially-private text generation degrades output language quality

Çano, Erion, Habernal, Ivan

arXiv.org Artificial Intelligence, Sep-16-2025

Ensuring user privacy by synthesizing data from large language models (LLMs) tuned under differential privacy (DP) has become popular recently. However, the impact of DP fine-tuned LLMs on the quality of the language and the utility of the texts they produce has not been investigated. In this work, we tune five LLMs with three corpora under four levels of privacy and assess the length, the grammatical correctness, and the lexical diversity of the text outputs they produce. We also probe the utility of the synthetic outputs in downstream classification tasks such as book genre recognition based on book descriptions and cause of death recognition based on verbal autopsies. The results indicate that LLMs tuned under stronger privacy constraints produce texts that are shorter by at least 77%, less grammatically correct by at least 9%, and less diverse by at least 10% in bi-gram diversity. Furthermore, the accuracy they reach in downstream classification tasks decreases, which might be detrimental to the usefulness of the generated synthetic data.
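
The abstract does not define its bi-gram diversity measure. One common choice is the distinct-2 ratio (unique bigrams divided by total bigrams), and the sketch below assumes that definition together with lowercased whitespace tokenization. The helper names are hypothetical and this is not the authors' evaluation code.

```python
def bigram_diversity(text: str) -> float:
    """Distinct-2 ratio: unique bigrams / total bigrams.

    Assumes lowercased whitespace tokens; returns 0.0 when the text
    has fewer than two tokens and therefore no bigrams.
    """
    tokens = text.lower().split()
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    return len(set(bigrams)) / len(bigrams)


def relative_drop(baseline: float, value: float) -> float:
    """Fractional decrease of `value` relative to `baseline`.

    A result of 0.10 corresponds to "lower by 10%".
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - value) / baseline
```

Under these assumptions, comparing `bigram_diversity` of texts from a non-private model with that of texts from a DP-tuned model via `relative_drop` gives a percentage of the kind the abstract reports.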

artificial intelligence, large language model, natural language, (16 more...)

2509.11176

Country:

Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?

Scaria, Nicy, Kennedy, Silvester John Joseph, Subramani, Deepak

arXiv.org Artificial Intelligence, Jul-1-2024

Small Language Models (SLMs) are generally considered to be more compact versions of large language models (LLMs), typically having fewer than 7 billion parameters. This study investigates the ability of small language models to learn, retain, and subsequently eliminate noise that is typically not found on the internet, where most pretraining datasets are sourced. For this, four pre-trained SLMs were utilized: Olmo 1B, Qwen1.5 1.8B, Gemma 2B, and Phi2 2.7B. The models were instruction-tuned without noise and tested for task execution with in-context learning. Afterward, noise patterns were introduced to evaluate the models' learning and unlearning capabilities. We evaluated the models' performance at various training levels. Phi consistently excelled with word-level noise but performed the worst with character-level noise. Despite being the smallest with approximately 1 billion parameters, Olmo performed consistently well on tasks.

accuracy, dataset, noise, (14 more...)

2407.00996

Country:

Europe > France (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Privacy Concerns in Chatbot Interactions: When to Trust and When to Worry

Saglam, Rahime Belen, Nurse, Jason R. C., Hodges, Duncan

arXiv.org Artificial Intelligence, Jul-8-2021

Through advances in their conversational abilities, chatbots have started to request and process an increasing variety of sensitive personal information. The accurate disclosure of sensitive information is essential where it is used to provide advice and support to users in the healthcare and finance sectors. In this study, we explore users' concerns regarding factors associated with the use of sensitive data by chatbot providers. We surveyed a representative sample of 491 British citizens. Our results show that user concerns focus on the deletion of personal information and on the inappropriate use of their data. We also identified that individuals were concerned about losing control over their data after a conversation with conversational agents. We found no effect from a user's gender or education but did find an effect from the user's age, with those over 45 being more concerned than those under 45. We also considered the factors that engender trust in a chatbot. Our respondents' primary focus was on the chatbot's technical elements, with response quality identified as the most critical factor. We again found no effect from the user's gender or education level; however, when we considered some social factors (e.g. avatars or perceived 'friendliness'), we found those under 45 years old rated these as more important than those over 45. The paper concludes with a discussion of these results within the context of designing inclusive, digital systems that support a wide range of users.

chatbot, participant, privacy concern, (12 more...)

doi: 10.1007/978-3-030-78642-7_53

2107.03959

Country: Europe > United Kingdom > England > Kent (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Adapting a Language Model for Controlled Affective Text Generation

Singh, Ishika, Barkati, Ahsan, Goswamy, Tushar, Modi, Ashutosh

arXiv.org Artificial Intelligence, Nov-8-2020

Humans use language not just to convey information but also to express their inner feelings and mental states. In this work, we adapt state-of-the-art language generation models to generate affective (emotional) text. We posit a model capable of generating affect-driven and topic-focused sentences without losing grammatical correctness as the affect intensity increases. We propose to incorporate emotion as a prior for a probabilistic state-of-the-art text generation model such as GPT-2. The model gives a user the flexibility to control the category and intensity of emotion as well as the topic of the generated text. Previous attempts at modelling fine-grained emotions fall short on grammatical correctness at extreme intensities, but our model is resilient to this and delivers robust results at all intensities. We conduct automated evaluations and human studies to test the performance of our model, and provide a detailed comparison of the results with other models. In all evaluations, our model outperforms existing affective text generation models.

emotion, grammatical correctness, intensity, (12 more...)

2011.04

Country:

North America > United States (0.94)
Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
Leisure & Entertainment (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
